mlr: A new Package to conduct Machine Learning Experiments in R

Authors

  • Bernd Bischl
  • M J A Eugster
  • F Leisch
  • R Kohavi
  • M Schmidtberger
  • M Morgan
  • D Eddelbüttel
  • H L Yu
  • L Tierney
  • U Mansmann
Abstract

The mlr package [1] provides a generic, object-oriented interface to many machine learning methods in R for classification and regression, and it can easily be extended with further ones. It enables researchers to rapidly conduct complex experiments or implement their own meta-methods using the package's building blocks. Resampling methods such as cross-validation, bootstrapping and subsampling are used to assess generalization performance. Hyperparameters of learners can be tuned by grid search or by more sophisticated deterministic or stochastic search methods such as Nelder-Mead or CMA-ES [3]. The same holds true for variable selection. Here, mainly the wrapper approach [4] is currently implemented, but the package will be extended with filters and a combination of the two. Benchmark experiments with two levels of resampling, e.g. nested cross-validation, can be specified with a few lines of code to compare different classes of learning algorithms. An interface to the benchmark package by Eugster and Leisch [2] is provided, which enables exploratory and inferential analysis of the results. Parallel high-performance computing is supported through the snowfall, nws and multicore packages [5]. Experiments can be converted to parallelized versions with a simple configuration command, without touching any further code. The job granularity of scheduled tasks can be changed so that jobs do not complete too early, providing better scale-up for problems of different sizes. The talk will include short use cases to explain the package and its optimization algorithms, some general remarks about its programming methodology and a live R demo.
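The two levels of resampling mentioned in the abstract can be illustrated with a small sketch. The Python code below is not mlr code and does not use mlr's API; it is a plain-library illustration written under stated assumptions. The toy 1-D k-nearest-neighbour learner, the synthetic data, and every function name (`make_folds`, `tune`, `nested_cv`, and so on) are hypothetical and chosen only for this example. The inner loop tunes one hyperparameter by grid search on the training part of each outer fold; the outer loop estimates the generalization error of the tuned learner on data the tuning never saw.

```python
import random

def make_folds(n, n_folds, rng):
    """Shuffle indices 0..n-1 and split them into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def error_rate(train, test, k):
    """Misclassification rate of k-NN fitted on train, evaluated on test."""
    wrong = sum(1 for x, y in test if knn_predict(train, x, k) != y)
    return wrong / len(test)

def split(data, fold):
    """Return (train, test), where test holds the points indexed by fold."""
    held_out = set(fold)
    train = [p for i, p in enumerate(data) if i not in held_out]
    test = [data[i] for i in fold]
    return train, test

def cv_error(data, folds, k):
    """Mean cross-validated error of k-NN over the given folds."""
    errors = [error_rate(*split(data, fold), k) for fold in folds]
    return sum(errors) / len(errors)

def tune(data, grid, n_folds, rng):
    """Inner loop: grid search, all candidates scored on the same folds."""
    folds = make_folds(len(data), n_folds, rng)
    return min(grid, key=lambda k: cv_error(data, folds, k))

def nested_cv(data, grid, outer_folds, inner_folds, seed=0):
    """Outer loop: tune on each outer training set, test on the held-out fold."""
    rng = random.Random(seed)
    results = []
    for fold in make_folds(len(data), outer_folds, rng):
        train, test = split(data, fold)
        best_k = tune(train, grid, inner_folds, rng)
        results.append((best_k, error_rate(train, test, best_k)))
    return results

def toy_data(n, seed=1):
    """Synthetic two-class data: class 0 centred at 0.0, class 1 at 3.0."""
    rng = random.Random(seed)
    return [(rng.gauss(3.0 * (i % 2), 1.0), i % 2) for i in range(n)]

GRID = [1, 3, 5, 7]  # candidate numbers of neighbours (odd, so no vote ties)
results = nested_cv(toy_data(60), GRID, outer_folds=3, inner_folds=4)
for best_k, err in results:
    print(f"selected k={best_k}, outer test error={err:.3f}")
```

Because the outer test fold is never used during tuning, the outer error is not biased by the hyperparameter search. Each outer fold may select a different value, so the result describes the whole "tune, then fit" procedure rather than one fixed configuration.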

Similar resources

mlr: Machine Learning in R

The mlr package provides a generic, object-oriented, and extensible framework for classification, regression, survival analysis and clustering for the R language. It provides a unified interface to more than 160 basic learners and includes meta-algorithms and model selection techniques to improve and extend the functionality of basic learners with, e.g., hyperparameter tuning, feature selection...

OpenML: An R Package to Connect to the Networked Machine Learning Platform OpenML

OpenML is an online machine learning platform where researchers can easily share data, machine learning tasks and experiments, as well as organize them online to work and collaborate more efficiently. In this paper, we present an R package to interface with the OpenML platform and illustrate its usage in combination with the machine learning R package mlr (Bischl et al., 2016). We show how the Op...

Spatiotemporal Estimation of PM2.5 Concentration Using Remotely Sensed Data, Machine Learning, and Optimization Algorithms

PM2.5 (particles <2.5 μm in aerodynamic diameter) can be measured at ground stations in urban areas, but the number of these stations and their geographical coverage are limited. Therefore, these data are not adequate for calculating concentrations of PM2.5 over a large urban area. This study aims to use Aerosol Optical Depth (AOD) satellite images and meteorological data from 2014 to 2017 ...

A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables

Light Detection and Ranging (LiDAR) is a remote sensor able to extract three-dimensional information. Environmental models in forest areas have benefited from the use of LiDAR-derived information in recent years. A multiple linear regression (MLR) with prior stepwise feature selection is the most common method in the literature for developing those models. MLR defines the relation between t...

Publication date: 2010